A Lookahead Read Cache: Improving Read Performance of Deduplication Storage for Backup Applications

Authors

  • Dongchul Park
  • Young Jin Nam
  • David H.C. Du
Abstract

Data deduplication (dedupe for short) is a specialized data compression technique that has been widely adopted, especially in backup storage systems, with the primary aims of saving both backup time and storage space. Consequently, most traditional dedupe research has focused on improving write performance during the dedupe process, while very little effort has gone into read performance. However, read performance in dedupe backup storage is also crucial when storage must be recovered after a system crash. In this paper, we design a new read cache in dedupe storage for a backup application to improve read performance by taking advantage of a special characteristic of backup workloads: the read sequence is the same as the write sequence. Thus, for better cache utilization, we can evict from the cache the data containers with the fewest future references by looking ahead at their future references in a moving window. Moreover, to achieve better read cache performance, our design maintains a small log buffer that judiciously retains data chunks that will be accessed in the future. Our experiments with real-world workloads demonstrate that the proposed read cache scheme substantially improves read performance.
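The eviction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simulate_lookahead_cache`, its parameters, and the tie-breaking rule are assumptions made for illustration, and the log buffer mentioned in the abstract is not modeled. The sketch assumes the read sequence is known in advance (it equals the write sequence) and is given as a list of container IDs, one per chunk read.

```python
from collections import Counter


def simulate_lookahead_cache(chunk_containers, cache_slots, window):
    """Replay a read sequence and count container fetches (cache misses).

    chunk_containers: container ID of each chunk, in read order
                      (assumed known, since reads follow the write order).
    cache_slots:      number of containers the cache can hold (hypothetical).
    window:           number of upcoming chunk reads the lookahead inspects.
    """
    cache = set()
    misses = 0
    for i, cid in enumerate(chunk_containers):
        if cid in cache:
            continue  # hit: the container is already cached
        misses += 1
        if len(cache) >= cache_slots:
            # Count how often each container appears in the moving window
            # of upcoming reads, then evict the cached container with the
            # fewest future references (ties broken by smallest ID, an
            # arbitrary choice made here for determinism).
            future = Counter(chunk_containers[i + 1 : i + 1 + window])
            victim = min(cache, key=lambda c: (future[c], c))
            cache.remove(victim)
        cache.add(cid)
    return misses
```

Calling `simulate_lookahead_cache(sequence, cache_slots=2, window=4)` on a list of container IDs returns the number of container fetches under this simplified policy; comparing that count against a recency-based policy on the same sequence is one way to see the effect of the lookahead.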

Similar resources

Improving Read Performance with BP-DAGs for Storage-Efficient File Backup

The continued growth of data and the high continuity of applications have raised a critical and mounting demand for storage-efficient, high-performance data protection. New technologies, especially D2D (Disk-to-Disk) deduplication storage, have therefore received wide attention in both academia and industry in recent years. Existing deduplication systems mainly rely on duplicate locality insi...

An SRP Target Mode to Improve Read Performance of SRP-Based IB-SANs

SCSI RDMA Protocol (SRP) is used to build high-performance Storage Area Networks (SANs) over InfiniBand, or SRP-based IB-SANs for short. I/O read performance is critical for many read-dominant applications, such as multimedia, remote sensing, and data backup. However, if I/O accesses focus on a specific storage device of an IB-SAN, the local I/O performance of a single device could become th...

Offline Selective Data Deduplication for Primary Storage Systems

Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunat...

Nitro: A Capacity-Optimized SSD Cache for Primary Storage

For many primary storage customers, storage must balance the requirements for large capacity, high performance, and low cost. A well-studied technique is to place a solid state drive (SSD) cache in front of hard disk drive (HDD) storage, which can achieve much of the performance benefit of SSDs and the cost-per-gigabyte efficiency of HDDs. To further lower the cost of SSD caches and increase ef...

Multi Level Caching and Anticipated Parallel Processing-Based Algorithm for Improving the Performance of the Distributed File System

Large amounts of data are being generated due to the extensive use of web applications by billions of users around the globe. Organizations that have deployed web applications are looking for solutions that provide scalable storage and faster access to large data. Distributed file systems (DFSs) have emerged as efficient storage solutions so that the data can be stored and accessed efficient...

Publication date: 2012